Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): do not delete the temp commit file when write_commit_entry returns error #2446

Closed
wants to merge 1 commit into from

Conversation

PeterKeDer
Copy link
Contributor

Description

Do not delete the temp commit file when write_commit_entry returns an error.

This fixes a rare bug when using DynamoDB as the locking provider. Consider the following events:

  1. S3DynamoDbLogStore.write_commit_entry succeeds at creating a DynamoDB entry, but fails during repair_entry
  2. The error handling deletes the tmp_commit file
  3. The next time repair_entry is called on this entry, it will lead to ObjectStoreError::NotFound since the temp commit file was deleted
  4. We infer the temp commit json was already moved, so we update the DynamoDB entry as complete

This results in a complete DynamoDB log entry, even though the corresponding transaction log does not exist in _delta_log.

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Apr 23, 2024
@ion-elgreco
Copy link
Collaborator

Not everyone uses dynamodb, so this will cause tmp log entries to be left around for adls and GCS.

You should make the change in the S3 dynamodb log store implementation to be more robust. If the object is not found it should not fail and just pass

@PeterKeDer
Copy link
Contributor Author

Hey @ion-elgreco, thanks for your suggestions. Could you elaborate on what you mean by this?

If the object is not found it should not fail and just pass

Happy to discuss other options. I'm not sure if there is a way to make the S3 DynamoDB log store more robust if we delete the temp commit file, since the DynamoDB entry becomes invalid and we won't be able to repair it.

What are your thoughts on returning TransactionError::LogStoreError with a specific message (or maybe a new TransactionError?) when we succeed at writing a DynamoDB entry but fail to move the file? Then we can handle this specific error by not deleting the temp commit file.

@PeterKeDer
Copy link
Contributor Author

Closing this PR in favor of #2452

@PeterKeDer PeterKeDer closed this Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants